Statistical framework for a Spanish spoken dialogue corpus

Authors

  • Carlos D. Martínez-Hinarejos
  • José-Miguel Benedí
  • Ramón Granell
Abstract

Dialogue systems are one of the most interesting applications of speech and language technologies. There have recently been some attempts to build dialogue systems in Spanish, and some corpora have been acquired and annotated. Using these corpora, statistical machine learning methods can be applied to try to solve problems in spoken dialogue systems. In this paper, two statistical models based on the maximum likelihood assumption are presented, and two main applications of these models on a Spanish dialogue corpus are shown: labelling and decoding. The labelling application is useful for annotating new dialogue corpora. The decoding application is useful for implementing dialogue strategies in dialogue systems. Both applications centre on unsegmented dialogue turns. The obtained results show that, although limited, the proposed statistical models are appropriate for these applications.

Similar resources

Evaluating spoken dialogue models under the interactive pattern recognition framework

The new Interactive Pattern Recognition (IPR) framework has been proposed to deal with human-machine interaction. In this context a new formulation has been recently defined to represent a Spoken Dialogue System as an IPR problem. In this work this formulation is applied to define graphical models that deal with Spoken Dialogue Systems. The definition of both a Dialogue Manager and a User Model...

Language Models for Name Recognition in Spanish Spoken Dialogue Systems

Current advances in dialogue systems require the development of language models for automatic speech recognition that are not only domain- or task-specific but also sub-task-specific (e.g. name, age or price recognition). This paper presents a method for creating language models for name recognition at the greeting stage of a conversation in spoken Spanish. In particular, we focus on the i...

Semi-automatic Domain Ontology Construction from Spoken Corpus in Tunisian Dialect: Railway Request Information

In this paper, we present a hybrid method for the semi-automatic construction of a domain ontology from a spoken dialogue corpus in Tunisian Dialect for the railway request information domain. The proposed method combines a statistical method for term and concept extraction with a linguistic method for semantic relation extraction. It consists of three fundamental phases, namely the corpus const...

Category-based Language Models in a Spanish Spoken Dialogue System

The main goal of this work is to study whether a language model based on categories can improve the performance of a dialogue system application, as it does when larger, non-spontaneous English corpora are used. First, several sets of categories, generated on the basis of different classification criteria, are obtained. Then, for each criterion, two language models are generated: A l...

An Evaluation Framework for Natural Language Understanding in Spoken Dialogue Systems

We present an evaluation framework to enable developers of information seeking, transaction based spoken dialogue systems to compare the robustness of natural language understanding (NLU) approaches across varying levels of word error rate and contrasting domains. We develop statistical and semantic parsing based approaches to dialogue act identification and concept retrieval. Voice search is u...

Journal:
  • Speech Communication

Volume 50

Publication date: 2008